Finite mixture models are typically inconsistent for the number of components

Author

  • Diana Cai
Abstract

A generative model is, of necessity, a vast simplification of the deeply complex real-world phenomena that govern any observed data set. It is only via this simplification that we can arrive at a tractable data analysis and discover meaningful and actionable patterns in data. In this sense, typically any model of a real-world data set is misspecified, and misspecification is unavoidable. But while misspecification in the form of simplification is powerful, it can also be potentially dangerous. In particular, certain kinds of misspecification can lead to fundamentally inaccurate or misleading inferences. For instance, recent work by Miller and Harrison (2013, 2014) serves as a cautionary tale about mixture modeling. In particular, mixture models are often matched with a nonparametric Bayesian prior by practitioners in order to discover the number of clusters in a set of data. Miller and Harrison (2013, 2014) demonstrate that such models are “severely” inconsistent for the number of clusters; that is, the probability of the correct number of clusters being recovered decreases to zero as the amount of data increases. An implication is that finite mixture models would be a more appropriate modeling choice. But empirical work by Miller and Dunson (2015) suggests otherwise. We here aim to demonstrate theoretically that even finite mixture models with an unknown number of clusters generally exhibit severe inconsistency, just as the nonparametric Bayesian models do. We discuss the implications for practical modeling and inference in mixture models.

Similar resources

Model Selection for Mixture Models Using Perfect Sample

We have considered a perfect sample method for model selection of finite mixture models with either a known (fixed) or an unknown number of components, which can be applied in the most general setting with assumptions on the relation between the rival models and the true distribution. Both, one, or neither may be well-specified or misspecified, and they may be nested or non-nested. We consider mixt...

Determination of the number of components in finite mixture distribution with Skew-t-Normal components

One of the main goals in mixture distributions is to determine the number of components. There are different methods for determining the number of components, for example, the Greedy-EM algorithm, which is based on adding a new component to the model until the best number of components is reached. The second method is based on maximum entropy, and finally the third method is based on non...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

The Negative Binomial Distribution Efficiency in Finite Mixture of Semi-parametric Generalized Linear Models

Introduction: Selecting the appropriate statistical model for the response variable is one of the most important problems in finite mixtures of generalized linear models. One distribution that poses a problem in finite mixtures of semi-parametric generalized statistical models is the Poisson distribution. In this paper, to overcome overdispersion and computational burden, finite ...

Performance Evaluation of Magnetorheological Damper Valve Configurations Using Finite Element Method

The main purpose of this paper is to study various configurations of a magnetorheological (MR) damper valve and to evaluate their performance indices, typically dynamic range, valve ratio, inductive time constant, and pressure drop. It is known that these performance indices (PI) of the damper depend upon the magnetic circuit design of the valve. Hence, nine valve configurations are considered fo...

Publication date: 2017